Louisville’s Nonprofit Landscape

An Overview

This document walks through some of our findings as well as the steps I took to get there. I’ve got two main buckets of data for you: nonprofits by geography, and nonprofits by category.

We have some more data elements to add, though I think this is a good place to pause and circle up to make sure I’ve got things right, since there are a lot of ways to slice and dice this data.

This is a more “gritty” walkthrough than I would usually provide, and I created it this way because A) it’s easiest for me to keep the documentation and results in the same place, and B) we’re going to make some decisions together about the data, so I wanted to give you all the info I have.

(If you’re feeling particularly adventurous, you can click on the “code” buttons to the right for a quick tutorial on how to do my job. You, too, can be a data scientist–just gotta copy and paste what’s in those blocks.)

Where are nonprofits?

This section runs through the number of nonprofits by geography. I’m going to include financial info as well, but I want to check which nonprofits we want those numbers for. Probably public charities?

Gathering the data

We start by figuring out which organizations are in Louisville. We’ll use data from the IRS Business Master File (downloaded 6/9/2022). This document contains a full listing of tax-exempt organizations across the county, as well as some basic financial info (revenue, expenses, and assets).

To subset the data to the Louisville area, we start by filtering the data to organizations with a zip code within the Louisville MSA. Zip codes cross county and MSA boundaries, so some of these organizations lie outside the MSA, and we’ll need to remove them.

About 15% of organizations have a PO box as their primary address. We include them in our list if more than 50% of their zip code is within the Louisville MSA. (We calculate the percent of the zip code in the Louisville MSA as the percent of the businesses in that zip code that are in the Louisville MSA.) For those in the Louisville MSA, we assign them to the county which contains the largest amount of that zip code.

The remaining 85% of businesses have a street address. For these organizations, we find the organization’s longitude and latitude using the Census Bureau’s address geocoding service, and we fill in missing values using the private service Geocodio. We then compare their latitude and longitude to a map of counties in the Louisville are to decide whether they are in the Louisville MSA and to decide which county they are located in.

While we’re working with our map data, we also assign nonprofits to neighborhoods and Metro Council Districts. As mentioned before, zip codes are included in the original dataset.

The map below shows the original list, as defined by zip codes. We remove the organizations represented by the blue dots, keeping only those in grey within the Louisville MSA boundaries. You can mouse over the list to see the organizations’ names.

# Read in IRS Business Master files and combine into one data frame
bmf1 <- read_csv("eo1.csv", col_types = "cccccccncnncnncnnnnnnnnnnncc")
bmf2 <- read_csv("eo2.csv", col_types = "cccccccncnncnncnnnnnnnnnnncc")
bmf3 <- read_csv("eo3.csv", col_types = "cccccccncnncnncnnnnnnnnnnncc")
bmf4 <- read_csv("eo4.csv", col_types = "cccccccncnncnncnnnnnnnnnnncc")

bmf <- bind_rows(bmf1, bmf2, bmf3, bmf4)

rm(bmf1, bmf2, bmf3, bmf4)

ntee_categories <- read_csv("NTEE categories.csv", col_types = "ccc")
ntee_codes <- read_csv("NTEE codes.csv", col_types = "ccc")
subsection_codes <- read_csv("Subsection codes.csv", col_types = "ccc")

# https://github.com/Nonprofit-Open-Data-Collective/irs-exempt-org-business-master-file#activity-codes
activity_to_ntee <- readxl::read_excel("Activity to NTEE.xlsx", sheet = 2)

ntee_categories %<>%
  rename(
    NTEE_cat = `NTEE Code`)

ntee_codes %<>%
  rename(
    NTEE_code = `NTEE Code`)

activity_to_ntee %<>%
  transmute(
    ACTIVITY = str_pad(ACTIVITY, 3, "left", "0"),
    NTEE_replacement = if_else(!is.na(NTEE_replacement), NTEE_replacement, NTEE))
bmf %<>%
  mutate(
    EIN,
    name = NAME,
    street = STREET,
    city = CITY,
    state = STATE,
    zip = str_sub(ZIP, 1, 5),
    tax_year = str_sub(TAX_PERIOD, 1, 4) %>% as.numeric(),
    creation = RULING,
    filing_type = if_else(PF_FILING_REQ_CD == 1, "990PF",
                  if_else(FILING_REQ_CD %in% c(1, 3, 4), "990",
                  if_else(FILING_REQ_CD %in% c(2), "990N",
                  "other"))),
    revenue = REVENUE_AMT,
    assets = ASSET_AMT,
    activity_codes = ACTIVITY,
    NTEE_code = NTEE_CD,
    NTEE_cat = str_sub(NTEE_CD, 1, 1))
# MSA_zip_crosswalk <- glptools::MSA_zip %>%
#   filter(MSA == "31140") %>%
#   mutate(business_in_county = round(business_in_county, 1))

MSA_zip_crosswalk <- glptools::MSA_zip %>%
  filter(MSA == "31140")

bmf_lou <- bmf %>% 
  left_join(MSA_zip_crosswalk, by = "zip") %>%
  filter(!is.na(MSA)) %>%
  select(EIN, street, city, state, zip)

# bmf_sure <- bmf %>%
#   filter(business_in_county == 100) %>%
#   mutate(lou_msa = "sure")
# 
# bmf_check <- bmf %>%
#   filter(business_in_county > 0, business_in_county < 100) %>%
#   mutate(lou_msa = "unsure")
# 
# output <- bind_rows(bmf_sure, bmf_check)

#write_csv(output, "Louisville Nonprofits 6_9_2022.csv")

# Remove PO Boxes as they will not be addressed
bmf_lou_po <- bmf_lou %>%
  filter(str_detect(street, "PO BOX|PO B0X"))

bmf_lou_remaining <- bmf_lou %>%
  anti_join(bmf_lou_po, by = c("EIN", "street", "city", "state", "zip"))

bmf_lou_po %<>% 
  left_join(MSA_zip_crosswalk, by = "zip") %>%
  filter(business_in_county >= 50) %>%
  select(EIN, street, city, state, zip)

# Geocode using the Census Bureau geocoder
bmf_lou_census <- bmf_lou_remaining %>%
  geocode(
    street = street,
    city = city,
    state = state,
    postalcode = zip,
    method = "census")

bmf_lou_remaining <- bmf_lou_census %>%
  filter(is.na(lat)) %>%
  select(-lat, -long)

bmf_lou_census %<>%
  filter(!is.na(lat))

# Geocode using the geocodio geocoder

Sys.setenv(GEOCODIO_API_KEY = "cccff3c3cc3aca633fc09ccc3901c1a861a9069")

bmf_lou_geocodio <- bmf_lou_remaining %>%
  geocode(
    street = street,
    city = city,
    state = state,
    postalcode = zip,
    method = "geocodio")

# Bind geocoded information and save
bmf_lou_geocoded <- bind_rows(bmf_lou_census, bmf_lou_geocodio, bmf_lou_po)

bmf_lou_geocoded %<>%
  select(EIN, lat, long)

save(bmf_lou_geocoded, file = "bmf_lou_geocoded.RData")
load("bmf_lou_geocoded.RData")

# For PO boxes, assign EINS to the county that contains the majority of the zip code
bmf_lou_POs <- bmf_lou_geocoded %>%
  filter(is.na(lat)) %>%
  left_join(bmf, by = "EIN") %>%
  select(EIN, zip) %>%
  left_join(FIPS_zip_full_MSA, by = "zip") %>%
  group_by(EIN) %>%
  arrange(desc(business_in_county)) %>%
  filter(row_number() == 1) %>%
  ungroup() %>%
  select(EIN, FIPS)
  
# For other addresses, create sf object and join with an MSA and neighborhood map
bmf_lou_addressed <- bmf_lou_geocoded %>%
  filter(!is.na(lat)) %>%
  st_as_sf(coords = c("long", "lat"), crs = 4326)

bmf_lou_county <- bmf_lou_addressed %>%
  st_join(map_msa_lou, st_within)

bmf_lou_nh <- bmf_lou_addressed %>%
  st_join(map_nh, st_within) %>%
  st_drop_geometry()

bmf_lou_district <- bmf_lou_addressed %>%
  st_join(map_district, st_within) %>%
  st_drop_geometry()
pal <- colorFactor(palette = c("#00a9b7FF", "#333333"),
                   domain = c("Keep", "Drop")) 

labels <- sprintf("%s",
                  left_join(bmf_lou_county, bmf, by = "EIN")$name) %>%
        lapply(htmltools::HTML)

leaflet() %>%
  addPolygons(data = map_msa_lou, 
              color = "black",
              opacity = 1,
              weight = 3,
              fillOpacity = 0) %>%
  addCircleMarkers(data = mutate(bmf_lou_county, keep = if_else(is.na(FIPS), "Drop", "Keep")),
                   radius = 2,
                   color = ~pal(keep),
                   label = labels,
      labelOptions = labelOptions(
        style = list("font-weight" = "normal", "font-family" = "Arial", padding = "3px 8px"),
        textsize = "15px",
        direction = "auto")) %>%
  addLegend(pal = pal, values = c("Drop", "Keep")) %>%
  addTiles() %>%
  setView(-85.6, 38.2, zoom = 8)
rm(pal)
# Filter to EINs in Lou MSA and remove county polygons from the data frame
bmf_lou_county %<>%
  st_drop_geometry() %>%
  filter(!is.na(FIPS)) %>%
  select(-county)

# Add neighborhood names to the data frame
bmf_lou_county %<>%
  left_join(bmf_lou_nh, by = "EIN") %>%
  left_join(bmf_lou_district, by = "EIN")

# Add PO box data 
bmf_lou_county %<>%
  bind_rows(bmf_lou_POs)

bmf_lou <- bmf_lou_county

bmf_lou_points <- bmf_lou_addressed %>%
  filter(EIN %in% bmf_lou$EIN)

rm(bmf_lou_geocoded, bmf_lou_county, bmf_lou_nh, bmf_lou_district, bmf_lou_POs, bmf_lou_addressed)

By County

Jefferson County has the largest number of nonprofits, though Harrison County happens to have the most per capita! You can mouse over the map below to see the numbers for each county. Click on the “Table” tab to see the data in a table format and to download the information.

Map

county_pop_df <- glpdata::population_msa_counties %>%
  filter(year == 2019,
         sex == "total", 
         race == "total")

county_df <- bmf_lou %>%
  group_by(FIPS) %>%
  count() %>%
  ungroup() %>%
  left_join(county_pop_df, by = "FIPS") %>%
  left_join(MSA_FIPS_info, by = "FIPS") %>%
  transmute(
    County = county %p% if_else(str_sub(FIPS, 1, 2) == "18", 
                                ", IN", ", KY"),
    Nonprofits = n,
    `Nonprofits per 1000 Residents` = round(n / population * 1000, 1))

county_df %<>%
  mutate(
    county = str_extract(County, "^.*(?= County)")) %>%
  left_join(map_msa_lou, by = "county")

pal <- leaflet::colorNumeric(
  palette = RColorBrewer::brewer.pal(9, "BuPu"),
  domain = range(county_df$`Nonprofits per 1000 Residents`))

labels <- sprintf("%s<br/>%s<br/>%s",
                  county_df$County,
                  "Nonprofit HQs per 1,000 residents: " %p% county_df$`Nonprofits per 1000 Residents`,
                  "Total Nonprofits: " %p% county_df$`Nonprofits`) %>%
        lapply(htmltools::HTML)

leaflet(st_as_sf(county_df)) %>%
  addPolygons(
      color = "#444444", weight = 1, 
      smoothFactor = 0.5, opacity = 1.0, fillOpacity = 0.5,
      fillColor = ~pal(`Nonprofits per 1000 Residents`),
      label = labels,
      labelOptions = labelOptions(
        style = list("font-weight" = "normal", "font-family" = "Arial", padding = "3px 8px"),
        textsize = "15px",
        direction = "auto")) %>%
  addTiles

Table

county_df %<>%
  select(County, Nonprofits, `Nonprofits per 1000 Residents`)

county_df %>%
  downloadthis::download_this(
    output_name = "Nonprofits by County",
    output_extension = ".csv",
    button_label = "Download data",
    button_type = "warning",
    has_icon = TRUE,
    icon = "fa fa-save",
    csv2 = FALSE)
apply_table_settings <- function(table) {
  table %>%
    opt_row_striping(row_striping = TRUE) %>%
    opt_table_outline() %>%
    tab_options(
      table.font.size = px(12),
      table.width = pct(50)) %>%
    tab_style(
      cell_text(
        weight = "bold"), 
      cells_row_groups())
}

county_df %>%
  gt() %>%
  fmt_integer(columns = c(Nonprofits)) %>%
  tab_header(title = "Nonprofits by county",
             subtitle = "") %>%
  apply_table_settings()
Nonprofits by county
County Nonprofits Nonprofits per 1000 Residents
Clark County, IN 485 4.1
Floyd County, IN 373 4.8
Harrison County, IN 239 6.0
Washington County, IN 139 5.0
Bullitt County, KY 243 3.0
Henry County, KY 63 4.0
Jefferson County, KY 4,554 5.9
Oldham County, KY 310 4.7
Shelby County, KY 216 4.6
Spencer County, KY 77 4.2

By Metro Council District

Unsurprisingly (to me, at least) Downtown has the largest concentration of nonprofit headquarters. Since Districts have approximately the same number of residents and I didn’t have the data on hand, I did not include per capita figures.

Map

council_df <- bmf_lou %>%
  group_by(district) %>%
  count() %>%
  ungroup() %>%
  filter(!is.na(district)) %>%
  transmute(
    `Council District` = district,
    Nonprofits = n)

council_df %<>%
  left_join(map_district, by = c("Council District" = "district"))

pal <- leaflet::colorNumeric(
  palette = RColorBrewer::brewer.pal(9, "BuPu"),
  domain = range(council_df$Nonprofits))

labels <- sprintf("%s<br/>%s",
                  "District " %p% council_df$`Council District`,
                  "Nonprofit HQs: " %p% council_df$Nonprofits) %>%
        lapply(htmltools::HTML)

leaflet(st_as_sf(council_df)) %>%
  addPolygons(
      color = "#444444", weight = 1, 
      smoothFactor = 0.5, opacity = 1.0, fillOpacity = 0.5,
      fillColor = ~pal(Nonprofits),
      label = labels,
      labelOptions = labelOptions(
        style = list("font-weight" = "normal", "font-family" = "Arial", padding = "3px 8px"),
        textsize = "15px",
        direction = "auto")) %>%
  addTiles

Table

council_df %<>%
  select(`Council District`, Nonprofits)

council_df %>%
  downloadthis::download_this(
    output_name = "Nonprofits by Metro Council District",
    output_extension = ".csv",
    button_label = "Download data",
    button_type = "warning",
    has_icon = TRUE,
    icon = "fa fa-save",
    csv2 = FALSE)
council_df %>%
  gt() %>%
  fmt_integer(columns = c(Nonprofits)) %>%
  tab_header(title = "Nonprofits by Council District",
             subtitle = "") %>%
  apply_table_settings()
Nonprofits by Council District
Council District Nonprofits
1 104
2 76
3 95
4 623
5 133
6 207
7 261
8 194
9 209
10 161
11 127
12 77
13 69
14 53
15 88
16 174
17 116
18 212
19 243
20 137
21 96
22 90
23 93
24 59
25 70
26 121

By Neighborhood

Map

nh_pop_df <- glpdata::population_nh %>%
  filter(year == 2019,
         sex == "total", 
         race == "total") %>%
  select(neighborhood, population)

nh_df <- bmf_lou %>%
  group_by(neighborhood) %>%
  count() %>%
  ungroup() %>%
  left_join(nh_pop_df, by = "neighborhood") %>%
  transmute(
    Neighborhood = neighborhood,
    Nonprofits = n,
    `Nonprofits per 1000 Residents` = round(n / population * 1000, 1))


make_map(rename(nh_df, neighborhood = Neighborhood), 
         "Nonprofits",
         hover_name = "Nonprofit HQ",
         units = "none")

Table

nh_df %>%
  downloadthis::download_this(
    output_name = "Nonprofits by Neighborhood",
    output_extension = ".csv",
    button_label = "Download data",
    button_type = "warning",
    has_icon = TRUE,
    icon = "fa fa-save",
    csv2 = FALSE)
nh_df %>%
  gt() %>%
  fmt_integer(columns = c(Nonprofits)) %>%
  tab_header(title = "Nonprofits by Council District",
             subtitle = "") %>%
  apply_table_settings()
Nonprofits by Council District
Neighborhood Nonprofits Nonprofits per 1000 Residents
Airport 7 NA
Algonquin-Park Hill-Park Duvalle 44 2.8
Buechel-Newburg-Indian Trail 115 3.3
Butchertown-Clifton-Crescent Hill 147 6.5
California-Parkland 69 7.8
Chickasaw-Shawnee 86 4.8
Downtown-Old Louisville-University 505 28.9
Fairdale 17 1.2
Fern Creek 67 2.4
Floyd's Fork 228 4.6
Germantown 77 5.9
Highlands 151 7.1
Highview-Okolona 170 2.6
J-Town 318 5.8
Northeast Jefferson 736 6.2
Phoenix Hill-Smoketown-Shelby Park 163 15.9
Pleasure Ridge Park 88 2.1
Portland 59 5.8
Russell 52 5.0
Shively 116 3.7
South Central Louisville 79 3.2
South Louisville 154 3.0
Southeast Louisville 238 4.2
St. Matthews 142 6.8
Valley Station 60 2.1
NA 2,811 NA

By Zip Code

(The table here is pretty ungainly to look at in my opinion, but you can download it.)

zip_pop_df <- glpdata::population_zip %>%
  filter(year == 2019,
         sex == "total", 
         race == "total") %>%
  select(zip, population_total) %>%
  distinct()

zip_df <- bmf_lou %>%
  left_join(bmf, by = "EIN") %>%
  group_by(zip) %>%
  count() %>%
  ungroup() %>%
  left_join(zip_pop_df, by = "zip") %>%
  transmute(
    `Zip Code` = zip,
    Nonprofits = n,
    `Nonprofits per 1000 Residents` = 
      round(n / population_total * 1000, 1))

zip_df %>%
  downloadthis::download_this(
    output_name = "Nonprofits by Zip Code",
    output_extension = ".csv",
    button_label = "Download data",
    button_type = "warning",
    has_icon = TRUE,
    icon = "fa fa-save",
    csv2 = FALSE)
# zip_df %>%
#   gt() %>%
#   fmt_integer(columns = c(Nonprofits)) %>%
#   tab_header(title = "Nonprofits by Zip Code",
#              subtitle = "") %>%
#   apply_table_settings()

make_map(rename(zip_df, zip = `Zip Code`), 
         "Nonprofits",
         hover_name = "Nonprofit HQ",
         units = "none")

Classifying organizations

Deciding which organizations fit where is complicated, to say the least. We have a couple ways of approaching it:

  • IRS subsection - Are you a 501(c)(3), a 501(c)(4), etc.?
  • Activity and NTEE codes - Application for nonprofit status ask organizations to classify their work. Until the mid-1990s, the IRS used a list of 266 “activity codes.” Organizations could list 1-3 activity codes. Beginning in the mid-1990s, the IRS began using the NTEE classification system, a revised set of categories with more uniformity and consistency. NTEE codes are hierarchical, with 26 top-level groups (e.g. Education, Health Care, Housing & Shelter), 211 sub-categories (e.g. PreK-12 Education, Hospitals, Housing Creation and Rehabilitation), and 655 lower-level groups (e.g. Preschools, Senior Citizens Housing & Retirement Communities, Specialty Hospitals). I refer to the 26 top-level groups as NTEE categories, the 211 middle-level groups as NTEE sub-categories, and the 655 specific groups as NTEE codes.
    • NTEE codes are composed of a letter and a two-digit number, like (e.g. B21). The letter indicates the NTEE category, the first digit indicates the sub-category, and the second digit indicates the code. For example, B represents education. Some codes within education are:
      • The B10s: organizations that fundraiser for educational organizations. B11 includes nonprofits that support a single organization, while B12 includes nonprofits that support multiple organizations.
      • The B20s are prek-grade 12 schools. (E.g. B21, B24, B25, B28, B29.)
      • B30 is vocational schools.
      • The B40s are colleges and universities of all kinds.
      • Etc.
    • Many codes are common across categories. For example, B11 represents Single Organization Support for educational nonprofits (the B category), while C11 represents Single Organization Support for environmental nonprofits (the C category).
    • Going from activity codes to NTEE codes is possible, but not precise. While we can generally connect activity codes to top-level NTEE codes, activity codes often relate to more than one specific NTEE code.)
  • Foundation classification - by default, all 501(c)(3)s are classified as “private foundations” unless they demonstrate that they are not, which makes them a “public charity”.
bmf_lou %<>%
  left_join(bmf, by = "EIN")

# Change AMVETS from subsection 91 to 19.
bmf_lou$SUBSECTION[bmf_lou$name == "AMVETS"] <- "19"

# Change Thornton to employee benefit trust
bmf_lou$SUBSECTION[bmf_lou$EIN == "611040959"] <- "09"

# Classify 501(c)(3) organizations as foundations, Churches/Religious orgs, Government entities, or charitable nonprofits
bmf_lou %<>%
  left_join(subsection_codes, by = "SUBSECTION") %>%
  mutate(
    org_type = case_when(
      SUBSECTION != "03" ~ `short name`,
      PF_FILING_REQ_CD == 1 ~ "Foundation",
      FILING_REQ_CD %in% c(6, 13) ~ "Church or Religious Organization",
      FILING_REQ_CD == 14 ~ "Government Entities",
      # other FILING_REQ_CD: 1, 2, 3
      FILING_REQ_CD %in% 1:3 ~ "Charitable Nonprofit",
      FOUNDATION %in% c(10) ~ "Church or Religious Organization",
      FOUNDATION %in% c(4, 11, 16, 17) ~ "Foundation",
      FOUNDATION %in% c(15) ~ "Government Entities"))

# Replace missing NTEE codes with an activity code to NTEE crosswalk
bmf_lou %<>%
  mutate(activity_code_1 = str_sub(ACTIVITY, 1, 3)) %>%
  left_join(activity_to_ntee, by = c("activity_code_1" = "ACTIVITY")) %>%
  mutate(
    NTEE_code = if_else(is.na(NTEE_code), NTEE_replacement, NTEE_code),
    NTEE_cat = str_sub(NTEE_code, 1, 1))

# Add NTEE1 and NTEE2
bmf_lou %<>%
  left_join(ntee_categories, by = "NTEE_cat") %>%
  mutate(NTEE1 = Description) %>%
  select(-Description, -Definition) %>%
  left_join(ntee_codes, by = "NTEE_code") %>%
  mutate(NTEE2 = Description)%>%
  select(-Description, -Definition) %>%
  mutate(NTEE2 = if_else(!is.na(NTEE1) & is.na(NTEE2), "uncategorized", NTEE2))
library(sunburstR)
library(d3r)

df <- bmf_lou_classified %>%
  select(org_type, NTEE1, NTEE2, name, revenue) %>%
  mutate(
    org_type = str_replace(org_type, "-", " "),
    NTEE1 = str_replace(NTEE1, "-", " "),
    NTEE2 = str_replace(NTEE2, "-", " "),
    name = str_replace(name, "-", " "),
    revenue = if_else(revenue < 0, 0, revenue)) %>%
  mutate(org_type = replace_na(org_type, "uncategorized"),
         NTEE1 = replace_na(NTEE1, "uncategorized")) %>%
  count(org_type, NTEE1, NTEE2, name) %>%
  rename(size = n)

tree <- d3_nest(df, value_cols = "size")

sunburst(tree, width="100%", height=600, legend = FALSE)

sund2b(tree, width="100%")
org_type1 <- bmf_lou_classified %>%
  group_by(org_type) %>%
  summarize(n = n(), .groups = "drop")

org_type2 <- bmf_lou_classified %>%
  filter(org_type == "Charitable Nonprofit") %>%
  group_by(NTEE1) %>%
  summarize(n = n(), .groups = "drop")

org_type3 <- bmf_lou_classified %>%
  filter(org_type == "Charitable Nonprofit") %>%
  group_by(NTEE2) %>%
  summarize(n = n(), .groups = "drop") %>%
  mutate(NTEE_cat = str_sub(NTEE2, 1, 1)) %>%
  left_join(ntee_categories, by = "NTEE_cat") 


org_type1 <- bmf_lou_classified %>%
  group_by(org_type) %>%
  summarize(n = sum(revenue, na.rm = TRUE), .groups = "drop")

org_type2 <- bmf_lou_classified %>%
  filter(org_type == "Charitable Nonprofit") %>%
  group_by(NTEE1) %>%
  summarize(n = sum(revenue, na.rm = TRUE), .groups = "drop")

org_type3 <- bmf_lou_classified %>%
  filter(org_type == "Charitable Nonprofit") %>%
  group_by(NTEE2) %>%
  summarize(n = sum(revenue, na.rm = TRUE), .groups = "drop")

Subsection codes

Our first way of separating out types of nonprofits is to look at thir subsection codes. We are primarily concerned with 501(c)(3)s, or charitable nonprofits, and all information from here on out will focus on those.

subsection_df <- bmf_lou %>%
  group_by(SUBSECTION, `short name`) %>%
  count() %>%
  ungroup() %>%
  transmute(
    Subsection = SUBSECTION,
    `Subsection name` = `short name`,
    `Nonprofits` = n)

subsection_df %>%
  downloadthis::download_this(
    output_name = "Nonprofits by IRS Subsection",
    output_extension = ".csv",
    button_label = "Download data",
    button_type = "warning",
    has_icon = TRUE,
    icon = "fa fa-save",
    csv2 = FALSE)
subsection_df %>%
  gt() %>%
  fmt_integer(columns = c(`Nonprofits`)) %>%
  tab_header(title = "Nonprofits by county",
             subtitle = "") %>%
  apply_table_settings()
Nonprofits by county
Subsection Subsection name Nonprofits
01 Credit Unions 4
02 Title-holding Organization 10
03 Charitable Nonprofits 5,406
04 Civic Associations 259
05 Labor and Agricultural Organizations 114
06 Business Associations 235
07 Social Clubs and Greek Organizations 248
08 Fraternal Societies 184
09 Employee Benefit Trusts 15
10 Fraternal Societies 53
12 Utilities 16
13 Cemeteries 54
14 Credit Unions 7
15 Mutual Insurance 1
19 Veteran Support Organizations 82
92 Trusts 11

Foundation Codes

501(c)(3)s are further grouped by foundation codes. The biggest distinction this creates is between private foundations and publicly-supported charities. At this stage, we classify organizations into the following groups: * Foundations, * Churches * Schools * Hospital and Medical Research Organizations, * Government Entities * Public Charities

bmf_lou %<>%
  mutate(
    org_type = case_when(
      SUBSECTION != "03" ~ `short name`,
      FOUNDATION %in% c(2:4, 21) ~ "Foundation",
      FOUNDATION == 10 ~ "Church or Religious Organization",
      FOUNDATION %in% c(11, 13, 23) ~ "School",
      FOUNDATION %in% c(12, 22, 24) ~ "Hospital or Medical Research Org",
      FOUNDATION == 14 ~ "Government Entity",
      FOUNDATION %in% 15:17 ~ "Public charity"))
      # other FILING_REQ_CD: 1, 2, 3
      # FILING_REQ_CD == 13 ~ "Church or Religious Organization",
      # FILING_REQ_CD == 14 ~ "Government Entities"))

bmf_lou_foundation <- bmf_lou %>%
  filter(SUBSECTION == "03") %>%
  group_by(org_type) %>%
  summarize(
    number = n(),
    revenue = sum(revenue, na.rm = T),
    missing_revenue = mean(is.na(REVENUE_AMT)),
  ) %>%
  ungroup() %>%
  transmute(
    `Organization Type` = org_type,
    Number = number,
    `Total Revenue` = revenue,
    `Percent missing revenue` = missing_revenue)

bmf_lou_foundation %>%
  downloadthis::download_this(
    output_name = "501(c)(3)s by foundation code",
    output_extension = ".csv",
    button_label = "Download data",
    button_type = "warning",
    has_icon = TRUE,
    icon = "fa fa-save",
    csv2 = FALSE)
bmf_lou_foundation %>%
  gt() %>%
  apply_table_settings() %>%
  fmt_currency(columns = vars(`Total Revenue`),
               use_subunits = F) %>%
  fmt_percent(columns = vars(`Percent missing revenue`),
              decimals = 0) %>%
  fmt_integer(columns = c(`Number`)) %>%
  tab_header(title = "501(c)(3)s by Foundation Code",
             subtitle = "") 
501(c)(3)s by Foundation Code
Organization Type Number Total Revenue Percent missing revenue
Church or Religious Organization 943 $22,941,639 95%
Foundation 458 $15,774,977 97%
Government Entity 2 $0 50%
Hospital or Medical Research Org 51 $6,848,285,186 47%
Public charity 3,872 $4,096,102,738 12%
School 80 $456,599,867 15%

NTEE Codes

Having drilled down to something close to public charities, the next logical question is what is going on in the public charities / social services sector?

Table

This table shows our public charities broken out by NTEE code. You can click on the arrow next to each title top open a sub-table with more detailed information.

Many organizations founded before the mid-90s are classified into a NTEE Category, but we are unsure which detailed code they belong to, so they end up in the “uncaegorized” category.

bmf_lou_table <- bmf_lou %>%
  filter(SUBSECTION == "03", org_type == "Public charity") %>%
  group_by(NTEE1) %>%
  summarize(
    number = n(),
    revenue = sum(revenue, na.rm = T),
    missing_revenue = mean(is.na(REVENUE_AMT)),
  ) %>%
  ungroup() %>%
  transmute(
    `NTEE Category` = NTEE1,
    Number = number,
    Revenue = revenue,
    `Pct missing revenue` = missing_revenue)

library(reactable)

bmf_lou_table %>%
  reactable(
    
    columns = list(
      `NTEE Category` = colDef(format = colFormat()),
      Number = colDef(format = colFormat(separators = TRUE)),
      Revenue = colDef(format = colFormat(prefix = "$", separators = TRUE, digits = 0)),
      `Pct missing revenue` = colDef(format = colFormat(percent = TRUE, digits = 1))),
    
    details = function(index) {
      
    this_data <- bmf_lou %>%
      filter(SUBSECTION == "03", org_type == "Public charity", NTEE1 == bmf_lou_table$`NTEE Category`[index]) %>%
      group_by(NTEE2) %>%
      summarize(
        number = n(),
        revenue = sum(revenue, na.rm = T),
        missing_revenue = mean(is.na(REVENUE_AMT))) %>%
        transmute(
          `NTEE Category` = NTEE2,
          Number = number,
          Revenue = revenue,
          `Pct missing revenue` = missing_revenue)
      
    htmltools::div(style = "padding: 1rem",
      reactable(this_data, outlined = TRUE,
        columns = list(
          `NTEE Category` = colDef(format = colFormat()),
          Number = colDef(format = colFormat(separators = TRUE)),
          Revenue = colDef(format = colFormat(prefix = "$", separators = TRUE, digits = 0)),
          `Pct missing revenue` = colDef(format = colFormat(percent = TRUE, digits = 1)))))
  },
  defaultPageSize = 30)

Interactive

This interactive lets you drill down through the 26 NTEE categories and 655 NTEE codes to see organizations.

By Number

This graph includes all 3,693 public charities in Louisville. The inner-most purple ring represents public charities, the next ring shows the 26 NTEE categories, the third ring shows all of the 655 NTEE codes present in Louisville, and the final ring shows individual organizations.

A couple of organizations are represented several times in the data under different EIN numbers–something for me to look into.

The “breadcrum” trail in the upper right shows you what percentage of local nonprofits are in the category you mouse over. As you click on the rings, it will zoom to that particular ring. (To Zoom out, click on a ring in the middle that you want to Zoom back out to.)

library(sunburstR)
library(d3r)

df <- bmf_lou %>%
  filter(SUBSECTION == "03", org_type == "Public charity") %>%
  select(NTEE1, NTEE2, name, revenue) %>%
  mutate(
    NTEE1 = str_replace(NTEE1, "-", " "),
    NTEE2 = str_replace(NTEE2, "-", " "),
    name = str_replace(name, "-", " "),
    revenue = if_else(revenue < 0, 0, revenue)) %>%
  mutate(NTEE1 = replace_na(NTEE1, "uncategorized")) %>%
  group_by(NTEE1, NTEE2) %>%
  count(NTEE1, NTEE2, name) %>%
  rename(size = n) %>%
  ungroup()

tree <- d3_nest(df, value_cols = "size")


sund2b(tree, width="100%", rootLabel = "Public Charities", 
       colors = htmlwidgets::JS("d3.scaleOrdinal(d3.schemeCategory20b)"))
By Revenue

This graph is similar to the last, though each organization’s size is relative to its’ last reported revenue. As a result, some individual organizations like the UL Research Foundation, Norton Healthcare, or UL Physicians Group make up an outsize proportion of the graph.

The “breadcrum” trail now shows the sum of reported revenue for these organizations: over $4 billion as of their last filings.

library(sunburstR)
library(d3r)

df <- bmf_lou %>%
  filter(SUBSECTION == "03", org_type == "Public charity") %>%
  select(NTEE1, NTEE2, name, revenue) %>%
  mutate(
    NTEE1 = str_replace(NTEE1, "-", " "),
    NTEE2 = str_replace(NTEE2, "-", " "),
    name = str_replace(name, "-", " "),
    revenue = if_else(revenue < 0, 0, revenue)) %>%
  mutate(NTEE1 = replace_na(NTEE1, "uncategorized")) %>%
  group_by(NTEE1, NTEE2, name) %>%
  summarize(revenue = sum(revenue, na.rm = T), .groups = "drop")

tree <- d3_nest(df, value_cols = "revenue")

sund2b(tree, width="100%", rootLabel = "Public Charities", valueField = "revenue",
       colors = htmlwidgets::JS("d3.scaleOrdinal(d3.schemeCategory20b)"))